IN THE CLAIMS: 



1. (Currently Amended) A method comprising:

receiving a first instruction, the first instruction of an instruction format
comprising a [[a]] first field to indicate a first operand having a first plurality of data
elements including at least A1, A2, A3, and A4 as data elements, and a second field to
indicate a second operand having a second plurality of data elements including at least
B1, B2, B3, and B4 as data elements, each of the data elements of the first and second
pluralities of data elements having a length of N bits; and

storing, in an architecturally visible destination operand, a packed data having a
length of at least 4N bits in response to said first instruction, by performing the operation
(A1 x B1) + (A2 x B2) to generate a first data element of the packed data, and performing
the operation (A3 x B3) + (A4 x B4) to generate a second data element of the packed
data, each of the first and second data elements having a length of at least 2N bits.
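
The operation recited in Claim 1 can be illustrated with a minimal Python sketch. It is an illustration only and forms no part of the claims. The function names `multiply_add_pairs` and `wrap_signed` are hypothetical, and the sketch assumes signed N-bit source elements and results that wrap at exactly 2N bits; Claim 1 itself fixes neither the signedness nor the overflow behavior.

```python
def wrap_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_add_pairs(a, b, n_bits=16):
    """Return [(A1*B1)+(A2*B2), (A3*B3)+(A4*B4), ...] as 2N-bit results.

    Assumption: source elements are signed N-bit integers and each sum
    wraps around at 2N bits (no saturation).
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("operands need the same even number of elements")
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if any(not lo <= x <= hi for x in (*a, *b)):
        raise ValueError("element does not fit in a signed N-bit field")
    results = []
    for i in range(0, len(a), 2):
        # Two adjacent products are summed into one double-width element.
        total = a[i] * b[i] + a[i + 1] * b[i + 1]
        results.append(wrap_signed(total, 2 * n_bits))
    return results
```

With four 16-bit elements per operand, `multiply_add_pairs([1, 2, 3, 4], [5, 6, 7, 8])` yields two 32-bit elements, 17 and 53, for a packed result of 4N = 64 bits.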

2. (Original) The method of Claim 1 wherein N is 16.

3. (Original) The method of Claim 1, said first plurality of data elements further
including at least A5, A6, A7, and A8 as data elements, and said second plurality of data
elements further including at least B5, B6, B7, and B8 as data elements, the method
further comprising: 

storing, in the architecturally visible destination operand, said packed data having
a length of at least 8N bits in response to said first instruction, by performing the
operation (A5 x B5) + (A6 x B6) to generate a third data element of the packed data, and
performing the operation (A7 x B7) + (A8 x B8) to generate a fourth data element of the
packed data, each of the first, second, third and fourth data elements having a length of at
least 2N bits.

4. (Original) The method of Claim 3 wherein N is 8. 

5. (Original) The method of Claim 4 wherein said first plurality of data elements
are treated as unsigned bytes. 

6. (Original) The method of Claim 5 wherein said second plurality of data 
elements are treated as signed bytes. 

7. (Original) The method of Claim 6 wherein each of said first, second, third 
and fourth data elements are generated using signed saturation. 
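
The combination recited in Claims 4 through 7 (N is 8, unsigned bytes in the first operand, signed bytes in the second, signed saturation of each result) can be sketched as follows. This is an illustration only and forms no part of the claims. The names `saturate_signed` and `multiply_add_ubyte_sbyte` are hypothetical, and the sketch assumes each result is clamped to exactly 2N = 16 bits, whereas the claims require only a length of at least 2N bits.

```python
def saturate_signed(value, bits):
    """Clamp `value` to the range of a signed two's-complement `bits`-bit integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def multiply_add_ubyte_sbyte(a_unsigned, b_signed):
    """Multiply unsigned bytes by signed bytes and sum adjacent products.

    Assumption: each sum of two products is saturated to a signed 16-bit value.
    """
    if len(a_unsigned) != len(b_signed) or len(a_unsigned) % 2 != 0:
        raise ValueError("operands need the same even number of elements")
    if any(not 0 <= x <= 255 for x in a_unsigned):
        raise ValueError("first operand elements must be unsigned bytes")
    if any(not -128 <= x <= 127 for x in b_signed):
        raise ValueError("second operand elements must be signed bytes")
    return [
        saturate_signed(
            a_unsigned[i] * b_signed[i] + a_unsigned[i + 1] * b_signed[i + 1], 16
        )
        for i in range(0, len(a_unsigned), 2)
    ]
```

Saturation matters because two products of an unsigned byte and a signed byte can sum to 2 x 255 x 127 = 64770, which exceeds the signed 16-bit maximum of 32767; under the stated assumption the result is clamped to 32767 rather than wrapping to a negative value.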

8. (Original) An apparatus to perform the method of Claim 7 comprising: 
at least one state machine; and 

a machine-accessible medium including data that, when accessed by said at least 
one state machine, causes said at least one state machine to perform the method of 
Claim 7. 

9. (Original) The method of Claim 4 further comprising:

storing, in the architecturally visible destination operand, said packed data having 
a length of at least 16N bits in response to said first instruction.

10. (Original) An apparatus to perform the method of Claim 4 comprising:
an execution unit; and 

a machine-accessible medium including data that, when accessed by said 
execution unit, causes the execution unit to perform the method of Claim 4.

11. (Original) The method of Claim 4 wherein said first field comprises bits five
through three of the instruction format. 

12. (Original) The method of Claim 11 wherein said second field comprises bits
two through zero of the instruction format. 

13. (Original) The method of Claim 12 wherein said architecturally visible
destination operand is indicated by said first field of the instruction format. 
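
The field positions recited in Claims 11 through 13 can be illustrated with a short Python sketch. It is an illustration only and forms no part of the claims. The function name `decode_operand_fields` is hypothetical, and the sketch assumes both fields sit in a single byte of the instruction encoding, which the claims do not state.

```python
def decode_operand_fields(field_byte):
    """Split one encoding byte into the two 3-bit operand fields.

    Assumption: bits five through three hold the first field and bits two
    through zero hold the second field of the same byte.
    """
    if not 0 <= field_byte <= 0xFF:
        raise ValueError("expected a single byte")
    first_field = (field_byte >> 3) & 0b111   # bits 5..3
    second_field = field_byte & 0b111         # bits 2..0
    return first_field, second_field
```

For the byte `0b11001010`, the first field decodes to 1 and the second to 2; under Claim 13 the first field would also indicate the destination operand.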

14. (Original) An apparatus comprising:

a first input to receive a first packed data comprising at least four data elements;

a second input to receive a second packed data comprising at least four data
elements;

a multiply-adder circuit, responsive to a first instruction, to multiply a first pair of 
data elements of the first packed data by respective data elements of the second packed 
data and to generate a first result representing a first sum of products of said 
multiplications of said respective data elements with said first pair of data elements, and 
to multiply a second pair of data elements of the first packed data by respective data 
elements of the second packed data and to generate a second result representing a second
sum of products of said multiplications of said respective data elements with said second 
pair of data elements; and 

an output to store a third packed data comprising at least said first and said second results
in response to the first instruction. 

15. (Original) The apparatus of Claim 14 wherein said first and second packed
data each contain at least eight data elements. 

16. (Original) The apparatus of Claim 15 wherein said first and second packed
data each contain at least 64-bits of packed data. 

17. (Original) The apparatus of Claim 15 wherein said first and second packed
data each contain at least 128-bits of packed data. 

18. (Original) The apparatus of Claim 17 wherein said first and second packed
data each contain at least sixteen data elements. 

19. (Original) The apparatus of Claim 17 wherein the first packed data comprises
unsigned data elements. 

20. (Original) The apparatus of Claim 17 wherein the second packed data 
comprises signed data elements.

21. (Original) The apparatus of Claim 20 wherein the first packed data comprises
unsigned data elements. 

22. (Original) The apparatus of Claim 21 wherein the first and second results are 
generated using signed saturation. 

23. (Original) The apparatus of Claim 14 wherein the first and second results are 
truncated. 

24. (Original) A computing system comprising: 
an addressable memory to store data; 

a processor including: 

a first storage area to store M packed data elements, the first storage area
corresponding to a first N-bit source; 

a second storage area to store M packed data elements, the second storage 
area corresponding to a second N-bit source; 

a decoder to decode a first set of one or more instruction formats having a 
first field to specify the first N-bit source and a second field to specify the second N-bit 
source; 

an execution unit, responsive to the decoder decoding a first instruction of 
the first set of one or more instruction formats, to produce M products of multiplication 
of the packed data elements stored in the first storage area by corresponding packed data 
elements stored in the second storage area, and to sum the M products of multiplication 
pairwise to produce M/2 results representing M/2 sums of products; and

a third storage area to store M/2 packed data elements, the third storage
area corresponding to a N-bit destination specified by the first instruction to store the
M/2 results; and

a magnetic storage device to store said first instruction. 
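
The relationship between N and M in Claim 24 can be checked with a small Python sketch. It is an illustration only and forms no part of the claims. The function name `lane_layout` is hypothetical, and the sketch assumes the M source elements divide each N-bit source evenly and the M/2 results divide the N-bit destination evenly; the claim does not state either condition.

```python
def lane_layout(n_bits, m_elements):
    """Return (source element width, result count, result width) in bits.

    Assumption: elements and results evenly fill their N-bit storage areas.
    """
    if m_elements <= 0 or m_elements % 2 != 0 or n_bits % m_elements != 0:
        raise ValueError("M must be a positive even divisor of N")
    source_element_bits = n_bits // m_elements
    result_count = m_elements // 2            # products are summed pairwise
    result_bits = n_bits // result_count      # each result is twice as wide
    return source_element_bits, result_count, result_bits
```

Under these assumptions, N = 128 with M = 16 gives 8-bit source elements and eight 16-bit results, and N = 64 with M = 8 gives 8-bit source elements and four 16-bit results.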

25. (Original) The computing system of Claim 24 wherein N is 128. 

26. (Original) The computing system of Claim 25 wherein M is 16.

27. (Original) The computing system of Claim 24 wherein N is 64. 

28. (Original) The computing system of Claim 27 wherein M is 8.

29. (Original) The computing system of Claim 28 wherein said M packed data 
elements of the first storage area are treated as unsigned bytes. 

30. (Original) The computing system of Claim 29 wherein said M packed data 
elements of the second storage area are treated as signed bytes. 

31. (Original) The computing system of Claim 30 wherein each of said M/2
results are generated using signed saturation. 